A Mixture Clustering Model for Pseudo Feedback in Information Retrieval

Authors

  • Tao Tao
  • ChengXiang Zhai
Abstract

Information Retrieval (IR) refers to retrieving relevant documents from a large document database according to a user-submitted query, and is among the most useful technologies for overcoming information overload. For example, Web search engines are now essential tools for finding information on the Web. Indeed, search capabilities are becoming more and more popular in virtually all kinds of information management applications. Given a query, a retrieval system typically estimates a relevance value for each document with respect to the query and ranks the documents in descending order of relevance. Over the decades, many different retrieval models have been proposed and tested, including vector space models, probabilistic models, and logic-based models [6]. As a special family of probabilistic models, language modeling approaches have attracted much attention recently due to their statistical foundation and empirical effectiveness [5, 2]. A particularly effective retrieval model based on statistical language models is the Kullback-Leibler (KL) divergence unigram retrieval model proposed and studied in [4, 8]. The basic idea of this model is to measure the relevance value of a document with respect to a query by the KL divergence between the corresponding query model and document model. Thus the retrieval task essentially boils down to estimating a query unigram language model and a set of document unigram language models, and retrieval accuracy depends largely on how well these models are estimated. In this paper, we study how to improve query model estimation by fitting a mixture model to a number of top-ranked documents retrieved by the original query itself. We present a new mixture model that extends and improves an existing mixture feedback model and addresses its two deficiencies. We study parameter estimation for this mixture model and evaluate the model on a document set with 160,000 news articles and 50 queries. The results show that using the new mixture model not only...
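The KL-divergence scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the query model is a plain maximum-likelihood estimate, and the document models use Dirichlet prior smoothing with a parameter `mu`, which is an assumption the abstract does not state. Documents are ranked by negative KL divergence, so a higher score means a closer match.

```python
import math
from collections import Counter


def unigram_mle(tokens):
    """Maximum-likelihood unigram model: relative word frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def smoothed_doc_prob(doc_tokens, collection_model, mu):
    """Return p(word | document) with Dirichlet prior smoothing.

    Assumption: the abstract does not say how document models are
    smoothed; Dirichlet smoothing is used here only for illustration.
    """
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    def prob(word):
        # Tiny floor for words unseen in the whole collection, so the
        # logarithm below is always defined (illustrative choice).
        background = collection_model.get(word, 1e-9)
        return (counts[word] + mu * background) / (doc_len + mu)

    return prob


def neg_kl_score(query_model, doc_prob):
    """Negative KL divergence, summed over words in the query model."""
    return -sum(
        p_q * math.log(p_q / doc_prob(word))
        for word, p_q in query_model.items()
    )


def rank_documents(query_tokens, docs, mu=2000.0):
    """Return document indices sorted from best to worst match."""
    collection_model = unigram_mle([w for doc in docs for w in doc])
    query_model = unigram_mle(query_tokens)
    scored = [
        (neg_kl_score(query_model, smoothed_doc_prob(doc, collection_model, mu)), i)
        for i, doc in enumerate(docs)
    ]
    return [i for _, i in sorted(scored, reverse=True)]
```

In this setup, pseudo feedback would replace the maximum-likelihood query model with one re-estimated from the top-ranked documents; the scoring function itself would stay unchanged.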

Similar resources

Robust Query-Specific Pseudo Feedback Document Selection for Query Expansion

In document retrieval using pseudo relevance feedback, after initial ranking, a fixed number of top-ranked documents are selected as feedback to build a new expansion query model. However, very little attention has been paid to an intuitive but critical fact that the retrieval performance for different queries is sensitive to the selection of different numbers of feedback documents. In this pap...

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword spotting is a well-known method in document image retrieval. In this method, search in document images is based on a query word image. In this paper, an approach for document image retrieval based on keyword spotting is proposed. The proposed method presents a framework using relevance feedback. Relevance feedback, an interactive and efficient method, is used in this paper to imp...

Recurrent Pseudo Relevance Feedback on Web Collections

Various Relevance Feedback techniques exist in Information Retrieval, such as Simulated Relevance Feedback and Pseudo Relevance Feedback. In Simulated Relevance Feedback, a new query is reformulated based on the documents selected by the user from the top-ranked documents, whereas in Pseudo Relevance Feedback the query is reformulated based on the assumption that N top-ranked docume...

Optimization of a Search Engine for an Organized and Effective Browsing

In web search applications, queries are submitted to search engines to represent the information needs of users. The goal is to discover the number of diverse user search goals for a query and to depict each goal with some keywords automatically. The existing work proposes a novel approach to infer user search goals by analyzing search engine query logs. First, it proposes a novel approach to infer user searc...

Relevance-based language modelling for recommender systems

Relevance-Based Language Models, commonly known as Relevance Models, are successful approaches to explicitly introduce the concept of relevance in the statistical Language Modelling framework of Information Retrieval. These models achieve state-of-the-art retrieval performance in the pseudo relevance feedback task. On the other hand, the field of Recommender Systems is a fertile research area w...

Publication date: 2004